ABSTRACT: Augmented reality (AR) still struggles to be widely used in real processes in the construction industry 
despite its great potential. This is partly due to the difficulties that exist in aligning holograms and maintaining 
their stability, especially for outdoor applications. In addition, since indoor-outdoor interactions are crucial for built 
environment management, it is important that AR apps work seamlessly across both. Alignment in indoor 
environments cannot make use of methods such as GNSS, nor can all environments be assumed to have been 
previously initialized with AR tools. Thus, marker-less AR registration is crucial for indoor applications. This 
paper presents an approach for marker-less AR registration seamlessly in both outdoor and indoor environments. 
Real-time kinematic positioning (RTK) and Inertial Measurement Units (IMU) technologies have been chosen for 
outdoor registration, while image comparison based on convolutional neural networks (CNN) has been chosen for indoor 
registration. In this research, the application of these two technologies and their integration have been studied 
and tested on site on a real Facility Management use case related to a university campus. The proposed approach 
has shown very promising results in displaying BIM elements of the electrical system seamlessly superimposed 
through AR to their physical counterparts in mixed indoor-outdoor environments. 
1. INTRODUCTION 


The AECO industry, although commonly recognized as one of the least digitized industries, is increasingly moving 
towards embracing more and more computer-based technologies to provide better performance in various stages 
of the building lifecycle (Albahbah et al., 2021). The Operation and Maintenance (O&M) phase of Facility 
Management (FM) accounts for the largest proportion of the whole life costs of the building process (Salman & 
Ahmad, 2023). The costs of the O&M phase represent 50-70% of the total annual facility operating costs and 85% 
of the entire lifecycle cost of a building. Ever since the facilities started to become more complex, the day-to-day 
tasks have also become more difficult. In fact, the increased need of the construction industry for visualization 
technologies arises from the complex nature of the industry and its high demand for information access for 
assessment, communication, and collaboration. Lack of coordination between facility managers and field workers 
results in delays and cost overruns which could easily be avoided with better coordination and visualization tools. 
In this domain, Augmented Reality (AR) technologies can be used as visualization tools for facilities’ O&M tasks 
and can provide significant advantages. AR impacts the mobile computing industry by radically changing the type 
of interaction between humans and computers. In fact, such technology creates direct, automatic, and practicable 
connections between the physical world and digital information by providing a simple and immediate user 
interface to a digitally enhanced physical world. Since AR allows virtual objects to be simultaneously 
superimposed on the real world, it helps to locate and view the occluded facilities and equipment and provides 
maintenance guiding instructions for the field workers (Salman & Ahmad, 2023). Through a hand-held device 
(HHD) or head-mounted device (HMD), it augments the real world by making implicit information apparent to the 
user when required. Many journal publications demonstrate that AR technology can be applied to various domains 
in the AECO industry, especially in the O&M phase of the project lifecycle (Baek et al., 2019; Jurado et al., 2021; 
Naticchia et al., 2021; Vaccarini et al., 2022). For example, current maintenance practices are characterized by 
scattered and disoriented facility information that the maintenance staff must fetch through specifications, 
maintenance reports, and checklists. In fact, 50% of the on-site maintenance time is still spent on localizing and 
navigating targets inside a facility (Salman & Ahmad, 2023). Even after locating the target, maintenance staff must 
put additional effort into seeing the target as it could be concealed in the case of piping, overhead ducts or behind 
a wall. 


AR poses a number of demanding technological requirements for its implementation (Costanza et al., 2009). One 
challenge is related to display technology, which has registered remarkable breakthroughs in the last decades. 
Precise position tracking constitutes another significant challenge. In order to give the illusion that virtual objects 
are located at fixed physical positions or attached to physical items, the system must know the position of relevant 
physical objects relative to the display system. Since the earliest days, spatial registration has been considered 
one of the most important technical aspects of an AR system and a core part of AR functionality 
(Albahbah et al., 2021; Salman & Ahmad, 2023). Spatial registration can combine virtual objects and the real 
environment with the correct spatial perspective relationship by calculating the correspondence between the 
virtual-world and real-world coordinate systems (Cheng et al., 2020). In more detail, spatial registration is 
responsible for calculating the user's correct spatial position and orientation in accordance with the real-world 
coordinate systems (Albahbah et al., 2021). 


Referee List (DOI: 10.36253/fup_referee_list) 
FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup_best_practice) 
Leonardo Messi, Francesco Spegni, Massimo Vaccarini, Alessandra Corneli, Leonardo Binni, Seamless Indoor/Outdoor Marker-Less Augmented Reality Registration Supporting Facility Management Operations, pp. 109-120, © 2023 Author(s), CC BY NC 4.0, DOI 10.36253/979-12-215-0289-3.11 


The spatial registration methods are generally classified into two categories: “marker-based” and “marker-less” 
methods. The first one is considered the most widely used spatial registration method. Markers can be 2D images 
with visual features or natural 3D objects in the real environment (Cheng et al., 2020). The widespread use of this method 
may be attributed to the simplicity, efficiency, and convenience of image recognition for superimposing virtual 
objects to the real world. Image recognition methods rely on extracting features from images instead of using 
complicated algorithms for calculating the relationship of relative positions. A similar approach can be applied 
with invisible markers, such as infrared and RFID ones (El Barhoumi et al., 2022). With marker-less approaches, 
instead of tracking features of markers, localization technologies are used to control the relative position between 
the real environment and virtual objects. GNSS is the most popular marker-less localization technology due to its 
suitability for use in a large open area such as a construction site and the ease of its signal receiving by common 
mobile devices (Cheng et al., 2020). According to the official US Government information on GNSS, the user 
range error (URE) for civil commitments cannot reach lower than 0.8 m (Cheng et al., 2020). The localization 
accuracy of GNSS is much worse in indoor environments, given that the buildings block the GNSS signal. The 
low accuracy of GNSS is not suitable for activities that require high accuracy or that mainly occur indoors. 
Compared with GNSS, some other marker-less localization technologies, such as Wi-Fi, Ultra-wideband (UWB), 
Inertial Measurement Unit (IMU), and Simultaneous Localization and Mapping (SLAM) can provide higher 
accuracy and can be applied to indoor activities. Another marker-less category is represented by vision-based 
methods using natural features for registration purposes (El Barhoumi et al., 2022). Each of these methods for 
spatial registration has its limitations in either accuracy or practicality. To promote the application of AR in the 
FM, which typically involves both indoor and outdoor environments, an advanced localization method that can 
provide an accurate and seamless registration in heterogeneous scenarios is needed. In order to cover this gap, a 
marker-less localization system for seamless indoor/outdoor AR registration has been developed by defining a 
cloud platform that hosts an indoor registration engine, an outdoor registration engine, plus a switch engine that 
manages the priority between the two. The developed system, tested on site on a real FM use case related to a 
university campus, has shown very promising results. The remainder of this paper is structured as follows. In 
Section 2, a literature review is presented. Section 3 reports the methodology adopted for the development of the 
proposed system. In Section 4, the design and execution of experiments on an FM use case are presented. Finally, Section 
5 is devoted to results discussion and conclusions. 


2. LITERATURE REVIEW 


In this section, a literature review concerning existing AR registration methodologies applied to both indoor 
(Section 2.1) and outdoor (Section 2.2) environments is reported. Understanding the strengths and possible gaps of 
approaches proposed by past studies and commercially available solutions paved the way to the definition of the 
indoor/outdoor seamless registration system proposed by this study. 


2.1 Indoor AR registration 


In the AECO industry, several AR registration methodologies for indoor applications have been developed and 
tested so far. Past studies have exhaustively tested marker-based approaches using visual markers distinctive in the 
scene (Lee & Akin, 2011; Park et al., 2013). Even though artificial markers are advantageous in terms of robustness 
in detection, they should be installed all over the facility before on-site activities, such as the actual FM, occur. In 
addition, visual markers can trigger aesthetic issues because of their distinctive appearance. Alternative solutions 
are represented by invisible markers, such as infrared (Kuo et al., 2013) and RFID (Carbonari et al., 2022; 
Naticchia et al., 2021), and natural markers (Koch et al., 2014) that do not aesthetically change the scene. However, 
even though invisible markers do not have aesthetic issues, they should be pre-installed. Natural markers, instead, 
have the limitation of depending on signs, including exit signs, fire extinguisher signs, and textual information 
signs. If the scene does not have such designated signs, the localization can be restricted (Baek et al., 2019). 
Commercial AR libraries, such as Vuforia (PTC Products, 2023), ARCore (Google LLC, 2023), and World Locking 
Tools (WLTs) (Microsoft, 2023) have been tested in indoor environments (Ashour et al., 2022; El Barhoumi et al., 
2022). Comparative tests of Vuforia Image Target and WLTs in indoor environments showed better results for the 
latter (Teruggi & Fassi, 2022). 


Among marker-less methods, GNSS-based AR systems have been largely studied (Kim et al., 2013). However, 
they are considered inappropriate for indoor applications because of their low accuracy (Chen et al., 2019). 
Therefore, many studies have employed the Wi-Fi fingerprinting technology for indoor localization purposes 
(Ahmad et al., 2020; Chen et al., 2019). This approach loses accuracy in the case of multiple mobile devices. In 
fact, the localization accuracy, on the order of 1 m for Wi-Fi-based collaborative systems (Chen et al., 
2019), can still be improved. Another marker-less methodology for indoor localization is based on image 
comparison. Image-based localization is classically tackled by estimating a camera pose from correspondences 
established between sparse local features (Rublee et al., 2011) and a 3D Structure-from-Motion (SfM) 
(Schönberger & Frahm, 2016) map of the scene (Li et al., 2012). Image comparison methodologies are classified 
into direct-matching and image-retrieval methodologies (Baek et al., 2019). Direct-matching methodologies do 
not render images for dataset, but directly find correspondences between 3D structure and the queried image 
(Humenberger et al., 2020). This pipeline scales to large scenes using image retrieval (Cao et al., 2020). Image- 
retrieval methodologies attempt to find the closest image to the queried image among the preliminarily prepared 
dataset. The dataset images can be preliminarily collected by photographs or rendered from three-dimensional 
structure estimation. Recently, many of these steps or even the end-to-end pipeline have been successfully learned 
with neural networks (DeTone et al., 2017; Lindenberger et al., 2023; Sarlin, Cadena, et al., 2018). This approach, 
although it may lose accuracy whenever context is lacking or elements are repetitive, has shown great potential to 
develop indoor AR registration apps with applicability for non-expert users (Baek et al., 2019). 


2.2 Outdoor AR registration 


AR registration in outdoor environments presents distinct challenges compared to those encountered in indoor 
environments. Tracking and alignment approaches such as SLAM enable the placement of virtual objects relative 
to a local reference frame. However, the reliability of such tracking approaches in open environments is 
compromised due to (i) expanded spatial dimensions, (ii) the absence of readily available reference points, (iii) 
computational costs in large environments, and (iv) the dynamic nature of external spaces, which undergo frequent 
changes. Therefore, the AR registration approach in outdoor environments cannot solely rely on local reference 
systems but must necessarily be based on an absolute reference system, enabling the determination of the 
geographic pose of both the user and virtual objects (Cyrus et al., 2019; Ling et al., 2019; Marchand et al., 2016). 
To this end, hybrid AR registration approaches, based on the combined use of IMU and high-precision Global 
Navigation Satellite System (GNSS) such as the Real-Time Kinematics (RTK), have been pursued for the 
visualization of underground pipelines and subsurface data (Hansen et al., 2021; He et al., 2006; Roberts et al., 
2002), for urban navigation (Guarese & Maciel, 2019; Zhao et al., 2016), for agricultural vehicle navigation (Kaizu 
& Choi, 2012), and for the alignment of multiple smaller maps from an existing SLAM tracking system (Ling et 
al., 2019). Even from a commercial standpoint, there are currently not many solutions available that ensure the use 
of AR apps in outdoor environments either without relying on some kind of additional infrastructures (e.g., markers, 
QRcode, RFID, beacons, etc.) or without the need of manual/semi-manual alignment procedures, with some 
exceptions. For instance, Trimble Site Vision (Trimble Inc., 2023) makes use of the built-in GNSS receiver to 
achieve 1 centimeter of horizontal accuracy under RTK coverage. Similarly, Engineering-grade AR for AEC (vGIS 
Inc., 2023) developed by vGIS achieves the same centimeter-level accuracy under RTK coverage. In this case the 
RTK antenna is not directly integrated into the system but needs to be obtained from third-party vendors. However, 
relying on GNSS technology only means that the system cannot cope with urban-canyon scenarios and indoor 
environments. Due to these limitations and the persistently high costs, these solutions have not yet experienced 
widespread adoption in the construction industry. Delving deeper into this last notion, it is noteworthy that the 
individual components of RTK receivers are economically affordable, thereby fostering the proliferation of 
applications of this technology (Hansen et al., 2021). 


2.3 Research questions 


As demonstrated by the literature review reported in Sections 2.1 and 2.2, several AR registration methodologies 
exist. Limitations of existing indoor AR registration approaches must be considered. Marker-based approaches 
share the limitation of requiring a preliminary survey to install markers or the existence of signs to be used as 
natural markers (Baek et al., 2019). Among marker-less approaches, GNSS-based solutions are inappropriate for 
indoor applications because of the weakness of GNSS signals (Chen et al., 2019), whereas Wi-Fi-based approaches 
lose localization accuracy in case of multiple mobile devices (Salman & Ahmad, 2023). On the other hand, image-based 
systems, although they may lose accuracy whenever context is lacking or elements are repetitive, show great 
potential to develop indoor AR registration applications with applicability for non-expert users (Baek et al., 2019). 
With reference to outdoor AR registration approaches, limitations of marker-less GNSS-based approaches must be 
considered. First of all, the reduced reliability of GNSS in urban-canyon scenarios (e.g., proximity to urban 
elements, such as buildings, roofs, trees, and so on) limits possibilities of applications (Cheng et al., 2020; Ling et 
al., 2019). In addition, GNSS-based solutions are currently expensive (especially for high-precision RTK GNSS 
systems). Despite this, since single RTK receivers’ components are becoming available at affordable prices, the 
development of in-house devices is showing promising growth (Hansen et al., 2021). Finally, outdoor AR 
registration approaches are affected by the lack of integration to indoor scenarios, except through the use of 
additional supporting infrastructure (such as beacons) that constrain the deployment area (Cheng et al., 2020). 
Considering that indoor-outdoor interactions are crucial for managing the built environment, it would be important 
that AR applications can work seamlessly even during changes in environment. In addition, in order to ensure a 
wider applicability of AR, preliminary set up procedures for registration should be as simple as possible. In order 
to cover these gaps, this study aims to answer the following research questions: 


RQ1 What system architecture would ensure a seamless AR indoor/outdoor registration? 
RQ2 What technical solutions would make AR registration “plug-and-play” for wider applicability even 
among non-expert users? 


3. METHODOLOGY 
3.1 System architecture 


In order to answer the research questions (Section 2.3), the system architecture, reported in Fig. 1, has been defined. 
The proposed architecture is built on top of a BIM Cloud Platform which hosts the following four elements, each 
playing a crucial role in the overall functionality: (i) the Common Data Environment (CDE), (ii) the Outdoor AR 
Registration Engine, (iii) the Indoor AR Registration Engine, and (iv) the Switch Engine (Fig. 1). The BIM Cloud 
Platform serves as a centralized resource and processes hub that facilitates data processing, storage, and 
distribution. An important characteristic of the platform is its ability to host, localize, and align BIM models, 
images, and point clouds within a geospatial context. This geolocation feature enables the precise mapping of 
virtual assets and features to their corresponding real-world locations, facilitating the integration between the 
virtual and physical realms. One of the key responsibilities of the BIM Cloud Platform is therefore to manage the 
alignment processes. Particularly, the positioning of images within the platform is achieved by referencing the 
absolute world coordinates of the acquisition point, along with the accurate rotations. This process ensures that 
images (and point clouds) are precisely georeferenced and aligned to their real-world locations. This approach lays 
the foundation for understanding the subsequent paragraphs, which delve into the concept of “images in the vicinity 
of the user's position”. 


The CDE is responsible for structured (e.g., .ifc files) and unstructured (e.g., images) data storage. It facilitates 
accessibility to AR applications through dedicated clients. In this work, the CDE of the DICEA Department of the 
Università Politecnica delle Marche has been used. At its core, a graph database provides a resilient backbone, 
offering efficient storage, retrieval, and traversal of interconnected data elements. The next integral components 
are the two distinct registration engines, specialized for outdoor and indoor environments, respectively. The 
Outdoor AR Registration Engine, which relies on the combination of RTK GNSS and IMU systems, is tailored to 
tackle the unique challenges presented in open spaces, such as dealing with the absence of reliable reference points 
and coping with large and dynamic environments. On the other hand, the Indoor AR Registration Engine is 
designed to excel in environments characterized by restricted access to GNSS signals, leveraging features like 
point clouds and aligned images to achieve accurate positioning. To this purpose, convolutional neural networks 
(CNN) (Sarlin, Cadena, et al., 2018; Sarlin, Debraine, et al., 2018) that simultaneously predict local features and 
global descriptors have been applied for accurate 6-DoF localization. Finally, the pivotal feature of this system lies 
in the Switch Engine, which effectively serves as an integrator between the outdoor and indoor registration engines. 
It is a rule-based engine that assesses the availability of either GNSS signals, or features, or both, and dynamically 
switches between the two registration approaches to maintain a consistent and uninterrupted AR experience. By 
synergistically combining the aforementioned four elements within the BIM Cloud Platform, the system 
architecture (Fig. 1) delivers a robust and marker-less AR system that can seamlessly adapt to both indoor and 
outdoor scenarios (i.e., answer to RQ1). Although the approach and methodology proposed in this paper are 
applicable to both head-mounted and hand-held AR devices, the focus from this point forward will be specifically 
on the usage of Microsoft’s AR tool, HoloLens2. To this end, a novel addition to the tool is introduced, developed, 
and physically realized, enabling a robust connection between the HoloLens2 device and the RTK receiver for 
precise calibration between the systems. The presented system architecture has been implemented in an AR 
application for HoloLens2 developed using the C# programming language and the serious game engine Unity3D. 


SECTION A - EXTENDED REALITY TECHNOLOGIES IN CONSTRUCTION 


[Figure 1 diagram labels: BIM Cloud Platform; BIM models; point clouds and referenced images; Outdoor AR Registration Engine; Indoor AR Registration Engine; Switch Engine; world coordinates.] 

Fig. 1: Architecture of the proposed system for seamless outdoor and indoor AR registration. 

3.1.1 Outdoor AR Registration Engine 


Following the work of Hansen et al. (2021) and Ling et al. (2019), the Outdoor AR Registration Engine 
proposed in this paper relies on the combination of an RTK GNSS tracking system and the HoloLens2’s built-in 
inertial tracking system. By aligning the local reference frame of the HoloLens2 device to global coordinates 
using the RTK measurements, and by using the HoloLens2’s capability to localize itself in the environment through 
a real-time mapping service, a geographical SLAM algorithm can be developed to obtain an absolute 6-DoF 
localization of the AR device and, therefore, correctly aligned virtual objects. Some considerations must be made: 


• a general 3D object in the BIM Cloud Platform has its own reference frame located in the world that is 
fully specified by its geographical coordinates and its orientation with respect to the North: Latitude (°), 
Longitude (°), Altitude (m), and Azimuth (°). The object’s coordinates refer to the WGS-84 standard; 

• the coordinates retrieved from the RTK system are based on the WGS-84 standard; 

• the RTK receiver has been developed in-house at low cost, with a view to widespread 
adoption; 

• the RTK receiver and the HoloLens2 must be rigidly connected to each other; 

• the RTK system is reliable as long as the receiver is within 10 km of the RTK base station antenna; 

• the HoloLens2’s local coordinate frame originates at the point where the AR application is turned on. 


The problem of placing world-referenced 3D BIM objects into the HoloLens2 local frame can be solved through 
the following steps (Fig. 2): (i) aligning the local frame with the North direction, (ii) adjusting the object 
position, (iii) adjusting the object altitude, and (iv) placing objects based on the distance from the observer. The 
resulting Outdoor AR Registration Engine’s process (Fig. 2) is automatically executed (i.e., answer to RQ2). Hence, 
the outdoor registration process, not requiring any particular action from the user, can find applicability even 
among non-expert users. 


[Figure 2 diagram: (1) GET SAMPLES from RTK GNSS and IMU; (2) COMPUTE AZIMUTH: compute azimuth to align the local frame to the North direction; (3) ADJUST POSITION AND ALTITUDE: update objects position.] 

Fig. 2: A schematization of the outdoor AR registration engine’s processes. 


The first part of the outdoor engine involves the initialization phase of the HoloLens2 position. It includes 
acquiring initial samples from both the RTK and IMU systems. Once initialized, the RTK system provides absolute 
3D coordinates of the body frame (Latitude $\varphi$, Longitude $\lambda$, and Altitude $h$), while the IMU provides local 
3D coordinates $(x, y, z)$ and rotations of the body frame. The local equirectangular projection must have its $y$ axis 
directed toward the North. However, at the beginning, the HoloLens2’s local frame has an arbitrary unknown 
orientation with respect to the North. To solve this, the device is moved between two positions: the traveled segment 
has an orientation $\beta'$ with respect to the $y'$ axis, whereas the same movement forms a bearing angle $\beta$ with respect to 
North, as shown in Fig. 3. The difference between the two angles gives the rotation that aligns the local frame with North. 


Fig. 3: Bearing angles with respect to North direction. 


Given the latitude and longitude of the start point $(\varphi_1, \lambda_1)$ and of the end point $(\varphi_2, \lambda_2)$ of a 
straight line along a great-circle arc, the initial bearing (sometimes referred to as forward azimuth) can be 
computed as follows: 


$$\beta = \operatorname{atan2}\big(\sin(\lambda_2-\lambda_1)\cos\varphi_2,\; \cos\varphi_1\sin\varphi_2-\sin\varphi_1\cos\varphi_2\cos(\lambda_2-\lambda_1)\big) \qquad (1)$$ 
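
As an illustration only, the bearing computation and the resulting North alignment can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, and it assumes that the local segment orientation is measured clockwise from the local y' axis, in the same sense as the geographic bearing.

```python
import math


def initial_bearing(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Initial bearing (forward azimuth) from point 1 to point 2,
    in degrees clockwise from North, following Eq. (1)."""
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)
    dlam = math.radians(lon2_deg - lon1_deg)
    num = math.sin(dlam) * math.cos(phi2)
    den = (math.cos(phi1) * math.sin(phi2)
           - math.sin(phi1) * math.cos(phi2) * math.cos(dlam))
    return math.degrees(math.atan2(num, den)) % 360.0


def local_heading(x1, y1, x2, y2):
    """Orientation of the same segment in the local frame, in degrees
    clockwise from the local y' axis (assumed convention)."""
    return math.degrees(math.atan2(x2 - x1, y2 - y1)) % 360.0


def north_offset(bearing_deg, local_heading_deg):
    """Clockwise rotation (degrees) that aligns the local y' axis with North."""
    return (bearing_deg - local_heading_deg) % 360.0
```

For example, a walk that the GNSS reports as due East (bearing 90°) but that the local frame records along its y' axis (heading 0°) yields an offset of 90°.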


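The geographical position of the HoloLens2 local reference frame $(\varphi_0, \lambda_0)$ can be computed by the reverse projection: 

$$\lambda_0 = \lambda' - \frac{x'}{R\cos\varphi_0} \qquad (2)$$ 

$$\varphi_0 = \varphi' - \frac{y'}{R} \qquad (3)$$ 

assuming that $\varphi_0 = \varphi_1$ and where $(\varphi', \lambda')$ are the GNSS coordinates, $(x', y')$ are the corresponding local planar coordinates, and $R$ is the radius of the globe. When the 
local frame position $(\varphi_0, \lambda_0)$ and the object’s geographic coordinates $(\varphi'', \lambda'')$ are known, the corresponding local 
planar coordinates can be computed by the forward projection: 


$$x = R(\lambda'' - \lambda_0)\cos\varphi_0 \qquad (4)$$ 
$$y = R(\varphi'' - \varphi_0) \qquad (5)$$ 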
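
The forward and reverse projections of Eqs. (2) to (5) can be sketched as follows. This is a hedged sketch rather than the paper's code: the function names and the mean Earth radius value are assumptions, angles are handled in degrees at the interface, and the origin latitude is computed first and reused in the longitude term instead of being approximated.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius, in meters


def forward_projection(lat_deg, lon_deg, lat0_deg, lon0_deg, radius=EARTH_RADIUS_M):
    """Local planar coordinates (x East, y North, meters) of a geographic
    point with respect to the local frame origin (lat0, lon0)."""
    phi0 = math.radians(lat0_deg)
    x = radius * math.radians(lon_deg - lon0_deg) * math.cos(phi0)
    y = radius * math.radians(lat_deg - lat0_deg)
    return x, y


def frame_origin(lat_obs_deg, lon_obs_deg, x_obs, y_obs, radius=EARTH_RADIUS_M):
    """Geographic position of the local frame origin, given a GNSS fix
    (lat_obs, lon_obs) taken at local planar coordinates (x_obs, y_obs)."""
    lat0 = lat_obs_deg - math.degrees(y_obs / radius)
    lon0 = lon_obs_deg - math.degrees(
        x_obs / (radius * math.cos(math.radians(lat0))))
    return lat0, lon0
```

A quick consistency check is to recover the origin from one fix and project the same fix forward again: the result should match the local coordinates that were passed in.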


When the observer moves too far from the origin, the local reference should be translated in the new position and 
the 3D objects must be positioned with respect to the new reference system by re-applying the previous equations. 
To track the true value of the observer altitude, each time a GNSS measure is acquired for the observer’s altitude 
$h'$, its height in the local coordinate system can be stored for later use as the height of the origin of the local frame. 
Consequently, given the altitude of the local frame ($h_0$), its height in local coordinates ($z_0$), and the altitude of an 
object ($h''$), the corresponding vertical coordinate $z$ of an object can be computed by: 


$$z = h'' - h_0 + z_0 \qquad (6)$$ 
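
The altitude adjustment of Eq. (6) amounts to keeping one stored pair of values and applying an offset. The following minimal sketch illustrates it; the class name and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AltitudeReference:
    """Stored pair from one GNSS sample (hypothetical container)."""
    h0: float  # GNSS altitude associated with the local frame (m)
    z0: float  # height of that sample in local coordinates (m)

    def vertical_coordinate(self, h_obj: float) -> float:
        """Local vertical coordinate z of an object with altitude h_obj (Eq. 6)."""
        return h_obj - self.h0 + self.z0


# Illustrative values only: reference at 120.0 m altitude and 1.5 m local height.
reference = AltitudeReference(h0=120.0, z0=1.5)
z_object = reference.vertical_coordinate(118.0)
```

With these illustrative numbers, an object at 118.0 m altitude lands 0.5 m below the local origin.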


If the observer vertically moves the objects at z’ to match their true height from the ground, the resulting true 
altitude is computed and stored. When the observer moves too far from the origin, the local reference should be 
vertically translated in the new altitude and the 3D objects must be positioned with respect to the new reference 
system by re-applying the previous equations. 


3.1.2 Indoor AR Registration Engine 


AR registration in indoor environments, which, unlike outdoor ones, are affected by restricted GNSS 
signals, requires a dedicated solution different from the one presented in Section 3.1.1. For this reason, an Indoor 
AR Registration Engine, based on image comparison with survey data of the analyzed environment, has been 
developed (Fig. 4). A preliminary on-site survey with a camera and LIDAR scanner (e.g., GeoSLAM ZEB 
Horizon) must be carried out in order to collect point cloud and aligned photos of the analyzed environment. 
Alternatively, point clouds can be generated from a photos collection using the incremental Structure-from-Motion 
methodology implemented by the COLMAP library (Schönberger & Frahm, 2016). The basic idea is to localize the 
HoloLens2 in 6 DoF by comparing a frame from its current view (i.e., query image) with images referenced 
to the point cloud of the analyzed environment (i.e., reference images). Both reference images and the point cloud 
are stored in the BIM Cloud Platform. In this study, the Hierarchical Feature Network (HF-Net) technology has 
been implemented for image comparison (Sarlin, Cadena, et al., 2018; Sarlin, Debraine, et al., 2018). HF-Net 
consists of a CNN able to simultaneously detect feature keypoints and compute local and global descriptors for 
accurate 6-DoF localization. The “hierarchical” attribute refers to the way HF-Net mimics how humans 
naturally localize themselves in a previously visited environment, with a “coarse-to-fine” approach. In other words, 
humans first localize themselves by looking at the global scene appearance and subsequently infer an accurate 
location from a set of likely places using local visual clues. This means that for each HoloLens2 registration call, 
a coarse search, consisting of global-descriptor matching between the query image and the reference images, is 
performed. Afterwards, a finer search based on local-descriptor matching between the query image’s 2D 
keypoints and the point cloud’s 3D points covisible in reference images is executed. Finally, a 6-DoF pose 
estimation of the query image is carried out by solving the Perspective-n-Point (PnP) problem (Kneip et al., 2011). 
The estimated pose is thus the rotation and the translation vectors that allow transforming 3D points expressed in 
the world coordinate system into the camera coordinate system. These parameters enable the indoor AR 
registration of HoloLens2’s gaze. 
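
To make the coarse step and the meaning of the estimated pose concrete, a simplified sketch is given here. It is not HF-Net and does not solve PnP: the descriptors are assumed to be already computed by the network, cosine similarity is an assumed ranking metric, and all names and values are hypothetical. The last function only shows how an estimated rotation and translation map a world point into the camera frame.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def coarse_search(query_descriptor, reference_descriptors, k=2):
    """Names of the k reference images whose global descriptors are most
    similar to the query (the candidate places for the finer search)."""
    ranked = sorted(
        reference_descriptors,
        key=lambda name: cosine_similarity(query_descriptor,
                                           reference_descriptors[name]),
        reverse=True,
    )
    return ranked[:k]


def world_to_camera(point_w, rotation, translation):
    """Apply an estimated pose: X_c = R * X_w + t (R as a 3x3 nested list)."""
    return tuple(
        sum(rotation[i][j] * point_w[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Toy 3-D "global descriptors" for three reference images (illustrative only).
references = {
    "corridor": [1.0, 0.0, 0.0],
    "lab": [0.0, 1.0, 0.0],
    "office": [0.9, 0.1, 0.0],
}
candidates = coarse_search([1.0, 0.0, 0.0], references, k=2)
```

In a real pipeline the finer search would match local descriptors only against the 3D points seen in the retrieved candidates, and a PnP solver would then produce the rotation and translation used by `world_to_camera`.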


Once the survey of the area of interest is completed, the presented Indoor AR Registration Engine’s process (Fig. 4) 
can be executed automatically (i.e., answer to RQ2). Hence, the indoor registration process, which does not require 
any particular action from the user, can find applicability even among non-expert users. 


[Figure 4 diagram: (1) SURVEY THE AREA: point cloud (directly or indirectly by SfM) and referenced photos; (2) COMPARE IMAGES (HF-NET): global-descriptor matching and local-descriptor matching; (3) ESTIMATE POSE (PNP): estimate position and heading for indoor AR registration.] 

Fig. 4: A schematization of the Indoor AR Registration Engine’s processes. 
3.1.3 The Switch Engine 


As illustrated in Fig. 1, the Switch Engine serves as a seamless integrator between the two types of registration, 
contributing to answering both RQ1 and RQ2. This component is a rule-based engine that enables seamless 
indoor/outdoor AR registration. The Switch Engine acts differently according to three possible scenarios: (i) RTK only 
(outdoor), (ii) RTK plus images/point clouds (outdoor), and (iii) images/point clouds only (indoor). In the first 
scenario, the Switch Engine identifies the availability of a stable and reliable RTK connection in outdoor scenarios. 
Consequently, the Switch Engine triggers the outdoor AR Registration Engine. The second scenario occurs when, 
in outdoor environments, images and point clouds are available simultaneously with RTK connection. The third 
scenario, instead, refers to indoor scenarios in which only images and point clouds are available. In both the second 
and third scenarios, as the Switch Engine identifies the presence of images and point clouds in the vicinity of the 
user’s real-world position (e.g., when approaching a previously surveyed building or asset), the system triggers 
the Indoor AR Registration Engine. At that point, the system entirely relies on images and point clouds for AR 
registration. 
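
The three scenarios can be summarized as a small rule table. The sketch below is a hedged illustration, not the implemented algorithm: the two boolean inputs (`rtk_reliable`, `dataset_nearby`) and the function name are hypothetical, and the behaviour when neither source is available is an assumption, since that case is not covered by the scenarios above.

```python
from enum import Enum
from typing import Optional

class Engine(Enum):
    OUTDOOR = "Outdoor AR Registration Engine"
    INDOOR = "Indoor AR Registration Engine"

def select_engine(rtk_reliable: bool, dataset_nearby: bool) -> Optional[Engine]:
    """Pick the registration engine from the availability of the two data sources.

    rtk_reliable: a stable and reliable RTK connection is available.
    dataset_nearby: images/point clouds exist in the vicinity of the user.
    """
    if dataset_nearby:
        # Scenarios (ii) and (iii): images/point clouds take priority,
        # whether or not an RTK connection is also available.
        return Engine.INDOOR
    if rtk_reliable:
        # Scenario (i): RTK only.
        return Engine.OUTDOOR
    # Assumption: no source available, so no engine is selected.
    return None

# Walk from the campus (RTK only) towards and into the surveyed building.
path = [(True, False), (True, True), (False, True)]
selected = [select_engine(rtk, data) for rtk, data in path]
```

Checking the dataset first encodes the stated priority: as soon as surveyed images and point clouds are found near the user, registration relies entirely on them.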


4. EXPERIMENTS 
4.1 FM use case 


The methodology proposed in this study has been tested on an FM use case based on a university campus, assumed 
as case study. Specifically, the study focused on the FM of the Digital Construction Capability Centre (DC3) Lab 
at the Università Politecnica delle Marche (Fig. 5 (a)). The DC3 Lab, which covers an area of 240 m², is 
composed of a main open space, a changing room, an office, and the restroom. Within this context, the management 
of the electrical system, and in particular of the internal electrical panel of the DC3 Lab, has been considered. 
During this activity, the technician in charge of FM operations spends time first locating the electrical panel. Then, 
in order to find the root cause of the problem, the technician may be asked to locate the panel’s associated cabling, 
which extends externally to the building. These cables can be accessed through manholes located on the road in 
front of the building (Fig. 5 (a)). Once all the elements involved in FM operations have been located, the technician may 
need technical information about the electrical system. For this purpose, he/she needs to access the as-built BIM 
model (Fig. 5 (b)). The implementation of the proposed methodology (Section 3) enables seamless AR 
registration in heterogeneous indoor/outdoor scenarios. Testing the proposed system on the presented use case allows its 
applicability in real-life situations to be assessed. As described in more detail in Section 4.2, the experimentation 
primarily revolves around the use of an AR application implemented on the HoloLens2 device. 


Fig. 5: (a) Aerial view of the university campus, assumed as case study, identifying the positions of the DC3 Lab 
(i.e., red placemark), the manhole cover (i.e., blue placemark), and the RTK base antenna (i.e., green placemark); 
(b) view of the BIM model of the DC3 Lab identifying the positions of the indoor electrical panel (i.e., red 
placemark) and the outdoor manhole cover (i.e., blue placemark). 


4.2 Experiments design and execution 


The developed system has been tested on the selected use case (Section 4.1) following the steps summarized in 
this section. First of all, a preliminary set-up phase consisting of collecting and initializing input data has been 
executed. This phase must be executed only once since related input and settings are maintained. It includes: 


1. collecting point clouds and aligned photos by carrying out a survey of the DC3 Lab. In this study, the survey 
has been carried out by using GeoSLAM Backpack Vision (Fig. 6 (a)), which collects simultaneously both 
point clouds and aligned photos with a single scanning (Fig. 6 (b)). Alternatively, the survey can be carried 
out by collecting photos (e.g., with a smartphone) and then generating a point cloud through the Structure- 
from-Motion methodology implemented by the COLMAP library (Schönberger & Frahm, 2016); 


2. collecting the BIM model of the DC3 Lab (Fig. 5 (b)); 


3. uploading the BIM model, point cloud, and images, related to the selected use case, on the BIM cloud platform. 
It must be noted that the point cloud and images are already aligned as a direct result of the survey. The BIM model must be 
aligned to this dataset by means of selected reference points; this alignment is executed 
only once since it is maintained. 


Once the previous preliminary steps are completed, the AR-based inspection of the electrical system related to the 
selected use case can start. In this study, the head-mounted AR device Microsoft HoloLens2 has been used (Fig. 6 
(c)). The AR application, based on the system architecture reported in Fig. 1, has been developed to support the 
technician inspecting the electrical system distributed in a heterogeneous indoor/outdoor scenario. The main 
contribution of the proposed system is the marker-less AR registration for displaying BIM models seamlessly 
superimposed to the whole inspected environment. The following steps have been executed on-site: 


4. wearing the HoloLens2 with the RTK receiver installed (Fig. 6 (c)) and launching the AR application; 

5. moving around the campus of the Faculty of Engineering at Università Politecnica delle Marche. During this 
preliminary step outside the DC3 Lab, the Outdoor AR Registration Engine is triggered in order to localize 
the user and drive him/her to the DC3 Lab; 

6. heading to the internal electrical infrastructure to inspect the electrical panel located inside the DC3 Lab (Fig. 
5 (a)). As the user moves from outdoors to indoors, the GNSS coverage decreases and the collected 
dataset (i.e., point clouds and aligned images) is found in the surroundings of the user’s position. Hence, the 
system seamlessly switches to the Indoor AR Registration Engine. This transition occurs without interruption 
as the system switches between registration modes through the Switch Engine’s algorithm; 

7. inspecting the internal electrical infrastructure, specifically focusing on the indoor electrical panel. During the 
indoor phase, the AR application relies on the Indoor AR Registration Engine. It superimposes the digital 
model of the electrical panel on the real asset to let the facility managers have all the required information 
from the BIM model for the inspection (Fig. 7 (a)); 

8. heading to the external electrical infrastructure to inspect the cabling associated with the internal electrical panel. 
As the user moves from indoors to outdoors, the GNSS coverage rises again and the system seamlessly 
switches to the Outdoor AR Registration Engine. This transition occurs without interruption as the system 
switches between registration modes through the Switch Engine’s algorithm; 

9. inspecting the external electrical infrastructure, specifically focusing on the manhole covers located on the 
street facing the building (Fig. 5 (b)). During the outdoor phase, the AR application relies on the Outdoor AR 
Registration Engine, leveraging geospatial data retrieved from the RTK receiver to overlay BIM data on the 
real asset (Fig. 7 (b)). 

Fig. 6: (a) Survey phase of the DC3 Lab by using GeoSLAM Backpack Vision for collecting (b) point clouds 
and aligned photos; (c) inspection phase with the Microsoft HoloLens2 and the RTK receiver integrated by a 3D 
printed add-on. 


Fig. 7: Visualization of the aligned holograms of (a) the indoor electrical panel and (b) the outdoor manhole 
cover through the AR application deployed on HoloLens2. 


5. DISCUSSION AND CONCLUSIONS 


This paper addresses the open issue of AR registration in mixed scenarios considering that indoor-outdoor 
interactions are crucial for built environment management. An extended literature review has found limitations of 
existing indoor and outdoor AR registration approaches, drawing the conclusion that an all-in-one solution 
conceived for heterogeneous scenarios does not exist yet. Hence, this work focuses on defining and implementing a 
system architecture for seamless AR registration even during changes in environment (i.e., RQ1). In doing this, 
technical solutions that would make AR registration applicable even among non-expert users have been considered 
(i.e., RQ2). In order to answer RQ1, a system architecture (Section 3.1), which delivers a robust and marker-less 
AR registration system for both indoor and outdoor scenarios, has been defined and implemented. The resulting 
system has been tested on-site on a FM use case (Section 4.1). The proposed marker-less localization system has 
been put in place considering the FM of a university laboratory’s electrical system. It has been selected because 
in-charge technicians and facility managers are continuously asked to locate, inspect, and repair interrelated system 
elements distributed in a heterogeneous indoor/outdoor environment. Efficiency of such activities is expected to be 
considerably improved by accessing BIM models directly superimposed to their physical counterparts. In this 
study, the head-mounted AR device Microsoft HoloLens2 has been used for 6DoF localization and seamless gaze 
registration in heterogeneous environments. Indoor inspections of the electrical system have specifically focused 
on the laboratory electrical panel, whereas the outdoor ones on the related cablings that can be inspected through 
a dedicated manhole cover. The proposed system has shown promising results in registering the BIM model on-site. 
In fact, the holograms of the electrical panel and the manhole cover were superimposed on their physical 
counterparts even though they are located in indoor and outdoor environments, respectively. This has confirmed one of 
the main contributions of the proposed system, that is a marker-less AR registration for displaying BIM models 
seamlessly superimposed to the whole inspected environment. In order to ensure a user-friendly AR registration 
process in mixed environments, hence providing an answer to RQ2, the proposed system requires only a 
preliminary set-up phase (i.e., steps 1-3 in Section 4.2) whose settings are then maintained. In fact, as confirmed 
by experiments, the AR experience is supported by the Switch Engine that manages priorities of the indoor and 
outdoor AR registration engines. In areas with GNSS coverage (i.e., generally outdoor environments), the Switch 
Engine delegates hologram superimposition to the Outdoor AR Registration Engine. This is the logic that 
regulates the superimposition of the external manhole cover hologram on its physical counterpart. As the GNSS 
coverage is lost and collected datasets (i.e., point clouds and aligned images) are found in the surroundings of the 
user’s position (i.e., generally indoor environments), the Switch Engine delegates hologram superimposition to 
the Indoor AR Registration Engine. This is the logic behind the superimposition of the internal electrical panel. As 
a result, since AR registration is automatically fulfilled in heterogeneous environments, the proposed solution can 
find applicability even among non-expert users. 


Current limitations of the proposed methodology can be traced back to the technologies adopted by the system’s engines. 
Accuracy loss may affect image comparison, adopted by the Indoor AR Registration Engine, when reference 
and/or query images lack context or contain repetitive elements. Follow-up studies may quantify this limitation. 
On the other hand, the Outdoor AR Registration Engine strongly relies on the availability of RTK coverage. 
Although this technology is currently expensive, the individual components 
of RTK systems are affordable, so applications of this technology are expected to proliferate. 
Further studies will be carried out in order to assess the registration accuracy of both the Indoor and Outdoor AR 
Registration Engines. More tests must be carried out in order to optimize switching thresholds based on GNSS 
coverage and the availability of datasets (i.e., point clouds and aligned images) in the surroundings of the user’s position. 
Future developments will also focus on the definition of a graphical user interface for better managing the entire 
AR registration workflow. Finally, the proposed system will be provided to non-expert users in order to quantify 
its contribution in terms of time saved in completing a task. 